Cone-beam computed tomography (CBCT) is mostly used for position verification during the treatment process. However, severe image artifacts in CBCT hinder its direct use in dose calculation and adaptive radiation therapy re-planning for proton therapy. In this study, an improved U-Net neural network named CBAM-U-Net, a CBCT denoising U-Net with convolutional block attention modules, was proposed for CBCT noise reduction in proton therapy. The datasets contained 20 groups of head and neck images. The CT images were registered to CBCT images as ground truth. The original CBCT denoising U-Net network, sCTU-Net, was trained for model performance comparison. The synthetic CT (SCT) images generated by CBAM-U-Net and the original sCTU-Net are called CBAM-SCT and U-Net-SCT images, respectively. The HU accuracies of the CT, CBCT, and SCT images were compared using four metrics: mean absolute error (MAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structure similarity index measure (SSIM). The mean values of the MAE, RMSE, PSNR, and SSIM of CBAM-SCT images were 23.80 HU, 64.63 HU, 52.27 dB, and 0.9919, respectively, which were superior to those of the U-Net-SCT images. To evaluate dosimetric accuracy, the range accuracy was compared for a single-energy proton beam. The γ-index pass rates of a 4 cm × 4 cm scanned field and a simple plan were calculated to compare the effects of the noise reduction capabilities of the original U-Net and CBAM-U-Net on the dose calculation results. CBAM-U-Net reduced noise more effectively than sCTU-Net, particularly in high-density tissues. We proposed a CBAM-U-Net model for CBCT noise reduction in proton therapy. Owing to the excellent noise reduction capabilities of CBAM-U-Net, the proposed model provided relatively explicit information regarding patient tissues. Moreover, it can be used in dose calculation and adaptive treatment planning in the future.
I. INTRODUCTION


Radiotherapy is among the most efficient cancer treatments, and approximately 80% of cancer patients require radiotherapy during treatment[1]. In recent years, proton therapy has gained increasing interest owing to its superior physical properties compared to conventional radiotherapy[2-7]. The dose of a proton beam increases slowly with depth within the entrance region and rises sharply near the end of the range, forming a Bragg peak. Thereafter, it declines rapidly in the distal fall-off region[8-10]. Because of the Bragg peak, a proton beam can accurately deliver a dose to the target volume and improve dose distribution and target volume conformality. This further protects patients and reduces damage to adjacent normal tissue caused by radiotherapy. Therefore, proton therapy is superior to traditional photon therapy for the treatment of tumors located close to organs at risk (OAR)[11-13]. Owing to the distinctive dose distribution of proton beams and the high sensitivity of the proton beam range to Hounsfield unit (HU) values, proton therapy requires more frequent imaging information to improve the effect of therapy and reduce physical uncertainties[14, 15].


Owing to the rapid development of radiation and medical imaging technologies, image-guided therapy continues to advance in the field of radiotherapy. Utilizing a vast array of applicable and effective technologies ensures that the radiation dose corresponds to the anatomical structure of the radiation target to the greatest extent possible, thereby improving the treatment quality[16-18]. Cone-beam computed tomography (CBCT) has become one of the most important components of image guidance equipment because of its multiple advantages, such as short scanning time, high spatial resolution, low exposure dose, and the ability to be located at the treatment site[19, 20]. CBCT is typically used either daily or weekly to verify patient position and monitor the patient’s anatomical structural changes. Because the scanner is located at the treatment site, the patient need not be moved, which avoids positioning errors over the course of treatment. However, CBCT is a three-dimensional image reconstructed from two-dimensional projection images, and its scatter noise and artifacts are more severe than those produced by conventional fan-beam CT (FBCT). Consequently, uncorrected CBCT images are not well suited to clinical dose calculations. Nevertheless, because CBCT is routinely acquired during radiation treatment, more accurate dose calculations based on CBCT images would provide more medical information about the anatomical changes in patients[21-24].


In recent years, many studies have been conducted to reduce the noise in CBCT images. Theoretically, the HU values in CBCT images can be recovered by deforming the planning CT (pCT) images using deformable image registration (DIR)[25-28]. However, the DIR-based methods are complicated and require careful evaluation. When the anatomical structure of the patient changes significantly, the accuracy of DIR for CBCT image improvement is limited[29]. In addition, generating an image using DIR can take a few minutes. The other frequently used method is the intensity correction method, which scales CBCT image intensities to the HU range of pCT images using population-based lookup tables. However, this method cannot correct artifacts that are intrinsic to CBCT, specifically the shadowing artifacts[30]. The application of the Monte Carlo (MC) method for CBCT noise reduction has also been investigated. Although MC-based correction techniques perform well, they may require hours of calculation and are therefore not suitable for clinical applications in adaptive radiotherapy[31]; certain GPU-based MC dose calculation methods have, however, reduced the computational cost[32]. With the development of computer technology and artificial intelligence, the application of artificial intelligence technologies for CBCT noise reduction has garnered attention. The use of deep learning as a method for CBCT noise reduction offers various benefits, including excellent image quality, continual learning, and quick computation after the model has been trained. CBCT and CT image noise-reduction technologies have been improved using techniques based on convolutional neural networks (CNNs)[33, 34]. Ronneberger et al. developed the U-Net network architecture in 2015[35], which demonstrated exceptional performance in applications involving biological segmentation. In 2020, based on U-Net, Chen et al. created sCTU-Net for CBCT noise reduction, realizing the conversion of CBCT images into CT-like images. The sCTU-Net may enable improved CBCT images to be employed in adaptive treatment planning[36].


This study developed a CBAM-U-Net network that integrates convolutional block attention modules, also known as CBAM blocks, into the sCTU-Net down-sampling and up-sampling modules[37]. In each CBAM block, the feature map was assigned different weights to increase the accuracy of the output results. The noise reduction performance of the proposed network was evaluated using the mean absolute error (MAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). In addition to the traditional image evaluation, the integrated depth dose (IDD) curve analysis of a single-energy proton beam and the γ-index pass rates of a proton beam square field and a simple plan were used to evaluate the performance of the network in noise reduction. This technology will be used in the CBCT imaging system of the proton therapy center[38] at Ruijin Hospital after verification.


II. MATERIALS AND METHODS
A. Dataset and Image Preprocessing


The preliminary data processing in this study was based on 
the analysis of the original data, which included 1477 head 
and neck CT slices and 1477 CBCT slices obtained from a 
cohort of 20 different patients. The original CBCT images 
were processed using the Varian iCBCT technology for noise 
reduction during image acquisition. The tube voltage was 120 kVp for CT and 100 kVp for CBCT. The voxel size of the CT images was 0.511 mm × 0.511 mm × 1.989 mm with an image resolution of 512 × 512. Further, the voxel size of the CBCT images was 0.976 mm × 0.976 mm × 3 mm with an image resolution of 512 × 512.

The CBCT images were acquired about one month after the CT images. Owing to the different acquisition times, there were minor anatomical deformations between the CT and CBCT images. Therefore, image registration between the CT and CBCT images was required. An anatomical deformable registration technique was selected in the 3D Slicer software and optimized using an adaptive stochastic gradient descent algorithm[39]. The CT images were registered to the CBCT images to keep the anatomical geometry consistent, and the registered CT images were resampled to a resolution of 0.976 mm × 0.976 mm × 3 mm, matching the CBCT images.

In this study, we strictly followed the registration results to produce CT-CBCT image pairs for training and testing. We randomly selected 1267 CT-CBCT image pairs from 17 patients as the training dataset and 210 CT-CBCT image pairs from the remaining three patients as the testing dataset, where each testing patient contributed approximately 70 pairs.
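The patient-level split described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `split_by_patient`, the seed, and the toy data (20 patients with exactly 70 slice pairs each) are assumptions made for the example. The point it shows is that all pairs from one patient go to the same side of the split, so no test patient contributes slices to training.

```python
import random


def split_by_patient(pairs_by_patient, n_test_patients, seed=0):
    """Split image pairs so that no patient appears in both sets."""
    patient_ids = sorted(pairs_by_patient)
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    rng.shuffle(patient_ids)
    test_ids = set(patient_ids[:n_test_patients])
    train, test = [], []
    for pid in patient_ids:
        target = test if pid in test_ids else train
        target.extend(pairs_by_patient[pid])
    return train, test, test_ids


# Hypothetical toy data: (patient, slice index) tuples stand in for CT-CBCT pairs.
pairs_by_patient = {
    f"patient_{p:02d}": [(f"patient_{p:02d}", s) for s in range(70)]
    for p in range(20)
}
train_pairs, test_pairs, test_ids = split_by_patient(pairs_by_patient, n_test_patients=3)
```

With the toy data, three patients yield 210 test pairs, which matches the size of the testing dataset; the training count differs from 1267 because the real patients do not all have 70 slices.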


B. Network Structure 


Figure 1 shows the network structure of CBAM-U-Net. 
The input data for the CBAM-U-Net network were CBCT 
images, and the output data were SCT images that were pro- 
cessed by the neural network. The first layer of the network 
was used to expand the original CBCT data to 32 channels. 
Each down-sampling module halved the height and width of its input and doubled the number of channels. After six down-sampling modules, 8 × 8 × 1024 feature images were created. Conversely, each up-sampling module doubled the feature image height and width and halved the number of channels. The 8 × 8 × 1024 feature images were up-sampled using six up-sampling modules, restoring the feature image dimensions to 512 × 512 × 32. Finally, an SCT image with dimensions 512 × 512 × 1 was produced by the output module. The attention mechanism modules were integrated into both the down-sampling and up-sampling modules. These modules enabled the network to allocate calculation weights to the sampling modules in each layer, thereby enhancing the accuracy of the neural network outputs.
Fig. 1. Overall structure of the CBAM-U-Net network.
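The feature-image dimensions along the encoder and decoder can be traced with a few lines of arithmetic. The sketch below is an illustration under a stated assumption, not the published architecture: doubling 32 channels six times would give 2048 channels, so to reproduce the stated 8 × 8 × 1024 bottleneck the sketch caps the channel count at 1024 (the `max_channels` parameter is hypothetical). The decoder is assumed to mirror the encoder.

```python
def trace_shapes(size=512, base_channels=32, n_levels=6, max_channels=1024):
    """Return (height, width, channels) after the first layer and each down-sampling module."""
    shapes = [(size, size, base_channels)]
    for _ in range(n_levels):
        h, w, c = shapes[-1]
        # Halve the spatial size; double the channels up to the assumed cap.
        shapes.append((h // 2, w // 2, min(2 * c, max_channels)))
    return shapes


encoder_shapes = trace_shapes()
# Assumption: the decoder mirrors the encoder, so its shapes are the reverse list.
decoder_shapes = encoder_shapes[::-1]
```

Under this assumption the encoder ends at (8, 8, 1024) and the decoder ends at (512, 512, 32), in agreement with the dimensions given in the text.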


The weight calculation of the feature images was executed in the sampling module of each layer to increase the output accuracy of the neural network. The convolutional block attention module is illustrated in Figure 2. It comprised two sub-modules: the channel and spatial attention modules. The channel attention module assigned a weight parameter to each feature image. The spatial attention module then applied weight parameters to each pixel of the feature image. This process occurred during feature-image generation. To increase the accuracy of the HU values in the output results, the convolutional block attention module thus gave additional weight to certain feature images and to certain areas within all feature images.
Fig. 2. Convolutional block attention module structure.


A description of the channel attention module is presented in Figure 3. The input feature images were initially processed using an adaptive average pooling layer, with the output having dimensions of 1 × 1 × C, where C is the total number of channels. The weight parameters were generated after the pooled data passed through two linear layers and a Sigmoid layer.
The input image was multiplied by the weight parameters to 
produce the output at the end of the module for the channel- 
attention mechanism. During the training of a neural network, 
certain feature images significantly affected the quality of the 
ideal output SCT image. After passing through the channel- 
attention mechanism module, these significant feature images 
were assigned larger weights. 


Fig. 3. Channel attention mechanism module structure. 
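The channel attention computation can be illustrated with a small pure-Python sketch. It is not the PyTorch module used in the study: the function name, the hand-picked weight matrices `w1` and `w2`, and the ReLU between the two linear layers are assumptions (the text only states two linear layers followed by a Sigmoid layer).

```python
import math


def channel_attention(feature_maps, w1, w2):
    """Reweight each channel of feature_maps (a list of C two-dimensional lists)."""
    # Global average pooling: one scalar per channel (the 1 x 1 x C descriptor).
    pooled = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # First linear layer followed by ReLU (the ReLU is an assumption).
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # Second linear layer followed by Sigmoid: one weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Multiply every pixel of a channel by that channel's weight.
    scaled = [
        [[weights[c] * v for v in row] for row in fm]
        for c, fm in enumerate(feature_maps)
    ]
    return scaled, weights


# Toy example: C = 2 channels of size 2 x 2 and a hidden width of 1.
features = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 4.0], [4.0, 4.0]]]
w1 = [[0.5, 0.5]]      # shape (hidden, C)
w2 = [[1.0], [-1.0]]   # shape (C, hidden)
scaled, weights = channel_attention(features, w1, w2)
```

In this toy case the first channel receives a weight close to 1 and the second a weight close to 0, which shows how the module can emphasize some feature images over others.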


Figure 4 shows the spatial attention mechanism module, which took the output of the channel attention mechanism module as its input. The spatial attention mechanism module compressed the feature images along the channel dimension using an average pooling layer and a maximum pooling layer, each producing a one-channel image. The resulting average and maximum maps were concatenated and processed using a convolution layer to obtain the spatial weight. The feature images were multiplied by the spatial weight to obtain the final output of the spatial attention mechanism module. Therefore, the spatial attention mechanism module can assign higher weights to key areas of the feature images and lower weights to areas outside the key areas to improve the capability of the network. The channel and spatial attention mechanism modules both assigned calculation weights to the feature images of the network. Throughout the learning process, the calculation weights were continuously modified to generate improved SCT images.
Fig. 4. Spatial attention mechanism module structure.
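The spatial attention step can be sketched in the same pure-Python style. Again this is an illustration under assumptions, not the trained module: the kernel size is not given in the text, so the kernel is passed in as a parameter, zero padding is assumed, and a Sigmoid is assumed to map the convolution output to a weight between 0 and 1.

```python
import math


def spatial_attention(feature_maps, kernel, bias=0.0):
    """Reweight every pixel of feature_maps (a list of C maps, each H x W)."""
    n_ch = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel-wise average and maximum: two one-channel maps.
    avg_map = [[sum(fm[i][j] for fm in feature_maps) / n_ch for j in range(width)]
               for i in range(height)]
    max_map = [[max(fm[i][j] for fm in feature_maps) for j in range(width)]
               for i in range(height)]
    stacked = [avg_map, max_map]   # the concatenated two-channel input
    k = len(kernel[0])             # kernel has shape (2, k, k) with k odd
    pad = k // 2
    weights = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = bias
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:  # zero padding
                            acc += kernel[ch][di][dj] * stacked[ch][y][x]
            weights[i][j] = 1.0 / (1.0 + math.exp(-acc))  # Sigmoid (assumed)
    out = [[[fm[i][j] * weights[i][j] for j in range(width)] for i in range(height)]
           for fm in feature_maps]
    return out, weights


# Toy example: two 2 x 2 channels with one bright pixel and a 1 x 1 kernel.
features = [[[0.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 4.0]]]
kernel = [[[1.0]], [[1.0]]]
out, weights = spatial_attention(features, kernel)
```

Here the bright pixel receives a spatial weight near 1 and the empty pixels receive 0.5, so the key area is emphasized relative to its surroundings.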


C. Network Training Parameters


PyTorch was used as the computing framework, and the L1 loss (Equation 1) was chosen as the loss function. The models were trained on two NVIDIA RTX 2080Ti GPUs with 24GB of memory. The batch size was set to 8, and each GPU received 4 images as the input. The initial learning rate was set at 1E-4, and there were 180 training epochs. Adam was used as the optimizer and StepLR was used as the learning rate scheduler. Training was performed on both the sCTU-Net and CBAM-U-Net models with the same training parameters, and the trained weight parameters were saved for each network.


$$\mathrm{L1Loss}\bigl(S(x_i), y_i\bigr) = \frac{1}{n}\sum_{i=1}^{n}\left|S(x_i) - y_i\right| \tag{1}$$

where $S(x_i)$ is the pixel value of the generated SCT image, $y_i$ is the pixel value of the corresponding CT image, and $n$ is the total number of pixels in the image.
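The effect of a step-decay learning rate schedule of the StepLR type can be sketched as follows. The initial rate of 1E-4 and the 180 epochs come from the training setup; the step size of 60 epochs and the decay factor of 0.1 are hypothetical values chosen only for illustration, since the text does not report them.

```python
def step_lr(initial_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` epochs under a step-decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


# step_size and gamma are assumed values, not taken from the study.
schedule = [step_lr(1e-4, epoch, step_size=60, gamma=0.1) for epoch in range(180)]
```

With these assumed values the rate stays at 1E-4 for the first 60 epochs, drops to 1E-5 for the next 60, and ends at 1E-6.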


D. Image Evaluation 


To evaluate the noise reduction performance, the MAE, RMSE, PSNR, and SSIM were used. The registered CT images served as the ground truth. The three testing datasets were analyzed using these four evaluation parameters, and the average values over the three testing datasets were calculated as the final assessment parameters.

The MAE and RMSE are expressed in Equations 2 and 3, respectively. Both are among the most widely used assessment criteria in image processing and offer the advantages of being easily understood and quantifiable. The smaller the MAE and RMSE of the two images, the higher the similarity between them. The corresponding formulas are as follows:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|S(x_i) - y_i\right| \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\bigl(S(x_i) - y_i\bigr)^2} \tag{3}$$

where $S(x_i)$ is the pixel value of the generated SCT image, $y_i$ is the pixel value of the corresponding CT image, and $m$ denotes the total number of image pixels.
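As a worked illustration of the MAE and RMSE definitions, the following sketch evaluates both on a handful of toy HU values. The pixel values are invented for the example and are not taken from the study; images are represented as flat lists of equal length.

```python
import math


def mae(sct, ct):
    """Mean absolute error between two equally sized flat pixel lists (HU)."""
    return sum(abs(s - y) for s, y in zip(sct, ct)) / len(ct)


def rmse(sct, ct):
    """Root mean square error between two equally sized flat pixel lists (HU)."""
    return math.sqrt(sum((s - y) ** 2 for s, y in zip(sct, ct)) / len(ct))


# Toy HU values (hypothetical).
ct_pixels = [0.0, 100.0, -1000.0, 1200.0]
sct_pixels = [10.0, 90.0, -1000.0, 1180.0]
mae_value = mae(sct_pixels, ct_pixels)    # (10 + 10 + 0 + 20) / 4 = 10 HU
rmse_value = rmse(sct_pixels, ct_pixels)  # sqrt((100 + 100 + 0 + 400) / 4) = sqrt(150) HU
```

Because the RMSE squares each difference, the single 20 HU error raises it above the MAE, which is why the two metrics are reported together.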

PSNR is defined as the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. The larger the PSNR, the higher the similarity between the two images. The formula for PSNR is expressed as Equation 4:

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \tag{4}$$

where $\mathrm{MAX}_I$ is the maximum value of the image pixel sampling point and MSE is the mean square error.
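The PSNR definition can be checked with a short sketch. The maximum pixel value is passed in as a parameter because the value used in the study is not stated; the toy pixel values and the choice of 1000 as the maximum are assumptions for the example.

```python
import math


def psnr(sct, ct, max_value):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((s - y) ** 2 for s, y in zip(sct, ct)) / len(ct)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy values (hypothetical): every pixel is off by 10, so MSE = 100.
ct_pixels = [0.0, 100.0, 200.0, 300.0]
sct_pixels = [10.0, 90.0, 210.0, 290.0]
psnr_value = psnr(sct_pixels, ct_pixels, max_value=1000.0)  # 10 * log10(1e6 / 100) = 40 dB
```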

SSIM is a statistic used to determine the overall similarity between two images. It is commonly used as a statistic to measure image generation and processing. SSIM ranges from -1 to 1. The closer the SSIM is to 1, the higher the similarity in the total information content between the two images. SSIM was calculated using Equation 5:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{5}$$

where $\mu_x$ is the mean of $x$, $\mu_y$ is the mean of $y$, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, $\sigma_{xy}$ is the covariance of $x$ and $y$, $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are constants used to maintain stability, and $L$ is the maximum value of image pixels. Further, $k_1$ and $k_2$ were 0.01 and 0.03, respectively.
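Equation 5 can be evaluated directly from whole-image statistics, as in the sketch below. This is a simplification made for illustration: SSIM is often computed over local windows and then averaged, and the text does not state which variant was used. The function name and toy values are assumptions.

```python
def ssim_global(x, y, dynamic_range, k1=0.01, k2=0.03):
    """SSIM of two flat pixel lists using global means, variances, and covariance."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy values (hypothetical).
image_a = [10.0, 20.0, 30.0, 40.0]
image_b = [40.0, 30.0, 20.0, 10.0]  # same pixels in reverse order
same = ssim_global(image_a, image_a, dynamic_range=255.0)
reversed_order = ssim_global(image_a, image_b, dynamic_range=255.0)
```

An image compared with itself gives an SSIM of 1, whereas the reversed image has the same mean and variance but a negative covariance, so its SSIM is lower.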

To show the matching degree between SCT, CBCT, and CT images in different HU value ranges, Q-Q (quantile-quantile) plots[40] between SCT and CT images and between CBCT and CT images from samples of the testing data were also plotted for detailed evaluation. Q-Q plots are commonly used in mathematical statistics to test the consistency of the distributions of two groups of data. The closer the distributions of the two sets of data, the closer their Q-Q plot lies to the 45° reference line.
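The points of a Q-Q plot are obtained by pairing matching quantiles of the two HU distributions. The sketch below uses a simple floor-based rank to pick each quantile; the function name, the number of quantiles, and the toy data are assumptions for illustration.

```python
def qq_points(reference, sample, n_quantiles=5):
    """Pair matching quantiles of two value lists (simple floor-based rank)."""
    ref_sorted, smp_sorted = sorted(reference), sorted(sample)
    points = []
    for q in range(1, n_quantiles + 1):
        p = q / (n_quantiles + 1)
        ref_q = ref_sorted[min(int(p * len(ref_sorted)), len(ref_sorted) - 1)]
        smp_q = smp_sorted[min(int(p * len(smp_sorted)), len(smp_sorted) - 1)]
        points.append((ref_q, smp_q))
    return points


# Toy data (hypothetical): the "CBCT" values are the "CT" values scaled by 0.9.
ct_values = [float(v) for v in range(0, 1000, 10)]
cbct_values = [0.9 * v for v in ct_values]
identical = qq_points(ct_values, ct_values)
scaled = qq_points(ct_values, cbct_values)
```

Identical distributions give points on the 45° line, while the scaled data fall on a line of slope 0.9, with the deviation growing toward high values, analogous to a systematic underestimation in high-density tissue.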


E. Evaluation of Dose Calculation Accuracy


The matRad dose calculation engine, an open-source software package developed by the German Cancer Research Center (DKFZ, Heidelberg) for scientific purposes of radiotherapy planning, was used to calculate the dose distribution of proton beams, where the ray-tracing algorithm was used for dose calculation[41].

First, for the depth dose analysis, the IDD curve of a proton beam with a nominal energy of 114.5 MeV was analyzed in matRad. The dose grid was set as 1 mm × 1 mm × 1.5 mm. A proton beam with a full width at half maximum (FWHM) of 5 mm in air was applied to the dose calculation models built from the head and neck CT, CBCT, CBAM-SCT, and U-Net-SCT images of the patient testing datasets. Second, for the lateral dose profile analysis, CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using a square-field γ-index pass rate to compare the calculation results. The scanned field size was set to 4 cm × 4 cm, the spot spacing was 2 mm, and the square-field dose calculation employed proton beams with a nominal energy of 114.5 MeV. Third, for further lateral dose analysis, a simple plan was also implemented, and under the same radiation conditions, the dose distribution of CT was used as a reference to compare the γ-index pass rates.
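The γ-index pass rate combines a dose-difference criterion and a distance-to-agreement criterion. The one-dimensional sketch below is only an illustration of that idea and is not the evaluation code used in the study: it assumes global normalization to the maximum reference dose, evaluates γ at the reference points, and skips points below a hypothetical 10% dose threshold. The profiles are toy data.

```python
import math


def gamma_pass_rate(positions, ref_dose, eval_dose, dose_tol, dist_tol_mm, cutoff=0.1):
    """Percentage of reference points with gamma <= 1 (1D, global normalization)."""
    d_max = max(ref_dose)
    dose_crit = dose_tol * d_max   # global dose criterion (assumption)
    passed = total = 0
    for x_r, d_r in zip(positions, ref_dose):
        if d_r < cutoff * d_max:   # skip low-dose points (hypothetical threshold)
            continue
        gamma = min(
            math.hypot((x_e - x_r) / dist_tol_mm, (d_e - d_r) / dose_crit)
            for x_e, d_e in zip(positions, eval_dose)
        )
        total += 1
        passed += gamma <= 1.0
    return 100.0 * passed / total


# Toy lateral profiles (hypothetical): the evaluated profile is shifted by 1 mm.
positions = [float(i) for i in range(11)]  # mm
reference = [0.0, 0.0, 50.0, 100.0, 100.0, 100.0, 100.0, 100.0, 50.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 50.0, 100.0, 100.0, 100.0, 100.0, 100.0, 50.0, 0.0]
loose = gamma_pass_rate(positions, reference, shifted, dose_tol=0.03, dist_tol_mm=3.0)
strict = gamma_pass_rate(positions, reference, shifted, dose_tol=0.01, dist_tol_mm=0.5)
```

In this toy case a 1 mm shift passes everywhere under the 3%/3 mm criterion but fails at the field edges under a stricter criterion, which is why tighter criteria are more sensitive to range and edge errors.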


III. RESULTS
A. Comparative analysis of CT images 


The network weight parameters of CBAM-U-Net and 
sCTU-Net were obtained under the same training condi- 
tions. The weight parameters produced by the sCTU-Net and 
CBAM-U-Net models were utilized to generate the SCT im- 
ages of the three patient datasets. The data obtained from 
the generation of SCT images were evaluated using the four 
indicators mentioned above. Figure 5 shows representative 
transversal images of CT, CBCT, and SCT from a patient. 
Both the sCTU-Net and CBAM-U-Net networks effectively reduced image noise and restored the original CT image features in regions with CBCT image artifacts. The mean values of the four metrics for each patient are presented in Table 1.


Table 1. Comparison of four image metrics between CBCT, SCT, and CT images of three patients

Patient     Image dataset   MAE (HU)    RMSE (HU)   PSNR (dB)   SSIM (a.u.)
Patient 1   CBCT            39.3256     104.8787    48.3045     0.9809
            U-Net-SCT       24.5496     65.5988     52.1678     0.9911
            CBAM-SCT        22.6519     57.1462     53.1665     0.9936
Patient 2   CBCT            41.2931     111.9404    47.6491     0.9780
            U-Net-SCT       25.4210     73.6275     51.0182     0.9892
            CBAM-SCT        23.1739     68.6911     51.6913     0.9909
Patient 3   CBCT            47.71311    115.2558    47.6525     0.9783
            U-Net-SCT       26.9864     76.3210     51.1504     0.9885
            CBAM-SCT        25.4442     67.2830     52.0463     0.9915


Table 1 shows that the U-Net-SCT and CBAM-SCT images were superior to the CBCT images in all four evaluation metrics for all three patients. Moreover, the CBAM-SCT images exhibited better image quality than the U-Net-SCT images.


Fig. 5. Typical transverse planes of (a) CT, (b) CBCT, (c) U-Net-SCT, and (d) CBAM-SCT images of a patient.


To demonstrate the CBAM-U-Net’s ability to reduce noise, error distribution maps were obtained by subtracting the CT images from the CBCT and SCT images, as shown on the left side of Figure 6. The CBCT, U-Net-SCT, and CBAM-SCT images had MAE values of 23.68, 17.75, and 13.02 HU, respectively. The errors in the CBCT images were larger than those in the SCT images and mainly appeared around the bone border. CBCT images also contained severe scatter noise. Both CBAM-U-Net and the original sCTU-Net reduced scatter artifacts and controlled noise to an acceptable level. Moreover, CBAM-U-Net exhibited a better noise reduction ability than sCTU-Net, particularly in high-density areas such as the skull.

Q-Q plots between CBCT and CT images and between SCT and CT images from one test patient are shown in Figure 6. The range of the horizontal coordinates of the Q-Q plots was set as the HU value range of the CT image, whereas that of the vertical coordinates was set as the HU value range of CBCT or SCT. The Q-Q plot of CT-CBCT deviated from the 45° reference line, particularly in high-density tissue. sCTU-Net corrected most of the errors, although errors remained in high-density tissue areas. The Q-Q plot of CT-CBAM followed the 45° reference line in both the soft tissue and bone areas, demonstrating that the CBAM-SCT images had a data distribution closely matching that of the CT images. The Q-Q plots show that the CBAM-U-Net network exhibited a better noise reduction ability in high-density tissue areas than the original sCTU-Net network.
Fig. 6. Absolute HU value difference between CT and CBCT or SCT images (left column). Q-Q plots between CT and CBCT images, between CT and U-Net-SCT images, and between CT and CBAM-SCT images (right column).


B. Single-beam depth dose analysis 


The IDD curve of the proton beam was analyzed using matRad. The high-density and soft-tissue areas of the three tested patients were selected as the analysis areas. The selection of these two areas reflected the actual clinical treatment conditions. Moreover, in dose calculations, artifacts in high-density tissue areas are a common cause of large dose calculation errors. Figure 7 shows the dose distribution map for Patient 1 along the beam.

Fig. 7. Dose distributions of patient 1 in high-density areas for (a) CT, (b) CBCT, (c) U-Net-SCT, and (d) CBAM-SCT images, and in soft-tissue areas for (e) CT, (f) CBCT, (g) U-Net-SCT, and (h) CBAM-SCT images.

The corresponding IDD curves are presented in Figure 8. The IDD curves of the CT images were used as the reference curves. The IDD curves for the high-density and soft-tissue areas were compared.

Fig. 8. (a) Complete IDD curve in high-density areas; (b) partial IDD curve in high-density areas; (c) complete IDD curve in soft-tissue areas; and (d) partial IDD curve in soft-tissue areas.

To show more details of the IDD curve in the Bragg peak region, the depth range of 100 mm to 145 mm was selected for analysis of the high-density areas, and the depth range of 120 mm to 145 mm was selected for the soft tissue areas. Both sCTU-Net and CBAM-U-Net exhibited effective noise reduction in soft tissue areas, as shown in Figure 8, and their IDD profiles were comparable. However, the noise reduction effect of CBAM-U-Net was better than that of the original sCTU-Net in high-density areas: the CBAM-U-Net IDD curve was closer to the CT IDD curve than the sCTU-Net IDD curve was. To quantify this difference, Table 2 presents the depth at the peak of the IDD curves for the three patients.

Because of the image artifacts in CBCT images, the IDD curves of the CBCT exhibited considerable inaccuracy compared to those of CT. Moreover, the IDD curves of CBCT cannot reflect the anatomical structure information of patients from the perspective of proton radiotherapy. The IDD curves of CBAM-SCT and U-Net-SCT matched those of CT better than the IDD curves of CBCT did, and they had nearly the same profile as the IDD curves of CT. According to the IDD curves, the CBAM-U-Net network was more effective than the original sCTU-Net network in reducing noise in high-density tissues. In addition, the results of CBAM-SCT images with a single-energy beam from the treatment angle and other key angles reflected tissue changes at the radiation site and the degree of anatomical deformation of the related tissue.
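The peak-depth comparison can be made explicit with a short calculation. The helper `peak_depth` is a hypothetical function that returns the depth of the maximum of a sampled IDD curve; the four peak depths are the Patient 1 high-density values reported in Table 2, and the shifts are simple differences from the CT value.

```python
def peak_depth(depths_mm, idd):
    """Depth at which a sampled IDD curve reaches its maximum."""
    peak_index = max(range(len(idd)), key=idd.__getitem__)
    return depths_mm[peak_index]


# Peak depths (mm) for Patient 1, high-density tissue, copied from Table 2.
peaks = {"CT": 122.6, "CBCT": 113.4, "U-Net-SCT": 126.7, "CBAM-SCT": 125.2}
shifts = {
    name: round(depth - peaks["CT"], 1)
    for name, depth in peaks.items()
    if name != "CT"
}
# shifts == {"CBCT": -9.2, "U-Net-SCT": 4.1, "CBAM-SCT": 2.6}
```

For this patient the CBCT peak lies 9.2 mm short of the CT peak, whereas the U-Net-SCT and CBAM-SCT peaks overshoot by 4.1 mm and 2.6 mm, respectively.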


Table 2. Depth at the peak of the IDD curve in soft and high-density tissue of three patients

Patient     Tissue                CT         CBCT       U-Net-SCT   CBAM-SCT
Patient 1   High-density tissue   122.6 mm   113.4 mm   126.7 mm    125.2 mm
            Soft tissue           133.4 mm   129.3 mm   134.4 mm    133.9 mm
Patient 2   High-density tissue   122.6 mm   113.4 mm   124.7 mm    124.7 mm
            Soft tissue           150.7 mm   147.7 mm   151.8 mm    151.8 mm
Patient 3   High-density tissue   123.7 mm   119.1 mm   126.7 mm    126.2 mm
            Soft tissue           147.7 mm   143.6 mm   149.2 mm    147.7 mm


C. Lateral dose comparison analysis


In matRad, the dose results of CT were used as a reference. CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using the 1%/1mm and 3%/3mm γ-index criteria in a square field to compare the calculation results. The square field size was set to 4 cm × 4 cm, and the spot spacing was 2 mm. The experiment employed proton beams with a nominal energy of 114.5 MeV. The high-density boundary was selected as the radiation analysis area because it can assess the difference in dose distribution between the high-density and soft tissue areas. Figure 9 shows the scanned field lateral dose distribution and absolute dose difference of Patient 1 compared with the dose distribution in the CT images. The γ-index pass rates of CBCT, U-Net-SCT, and CBAM-SCT relative to CT for the three patients are listed in Table 3.

Table 3. γ-index pass rates of CBCT, U-Net-SCT, and CBAM-SCT of three patients

Patient     Criterion   CBCT     U-Net-SCT   CBAM-SCT
Patient 1   1%/1mm      70.67%   79.03%      82.23%
            3%/3mm      93.89%   97.04%      98.09%
Patient 2   1%/1mm      85.05%   85.23%      85.73%
            3%/3mm      98.53%   98.57%      98.62%
Patient 3   1%/1mm      80.52%   86.88%      88.94%
            3%/3mm      97.12%   98.77%      99.25%

Fig. 9. (a) Dose distribution in the CT image square field of patient 1; (b) dose distribution in the CBCT image square field of patient 1; (c) dose distribution in the U-Net-SCT image square field of patient 1; (d) dose distribution in the CBAM-SCT image square field of patient 1; (e) absolute dose difference between CBCT and CT image square fields of patient 1; (f) absolute dose difference between U-Net-SCT and CT image square fields of patient 1; and (g) absolute dose difference between CBAM-SCT and CT image square fields of patient 1.


The absolute dose difference between the CBAM-SCT and CT image square fields was smaller than those between the CBCT or U-Net-SCT and CT image square fields. In addition, Figure 9 shows that the absolute dose difference of CBAM-SCT images in the bone region was lower. The γ-index pass rate of CBAM-SCT was superior to that of U-Net-SCT under both the 1%/1mm and 3%/3mm criteria. In particular, under the 1%/1mm criterion for patient 1, the γ-index pass rate of CBAM-SCT was 82.23%, which is 3.20 percentage points higher than the 79.03% of U-Net-SCT. Noise in higher-density tissues with a higher stopping power ratio leads to a greater dose error in radiotherapy dose calculation. CBAM-U-Net has a more robust noise reduction function in high-density areas; therefore, CBAM-SCT images are more suitable for clinical dose calculation requirements than U-Net-SCT images.

According to the results of the square-field calculation, the CBAM-U-Net network can improve the accuracy of the dose calculation and eliminate the image artifacts existing in the original CBCT images. The γ-index pass rate of the SCT images generated by the CBAM-U-Net network was higher than that produced by the original sCTU-Net network. The application of square-field dose calculations can partially account for the influence of anatomical structural changes on the original treatment plan and provide additional data support for subsequent treatment.

To further evaluate the effect of image quality on the dose calculation results, delineation of a simple PTV and OAR was implemented. The CT, CBCT, U-Net-SCT, and CBAM-SCT images of Patient 1 used the same PTV and OAR contours to eliminate errors caused by manual delineation. A single-field plan was adopted with a 180° gantry angle and 0° couch angle, and single-field inverse planning was used to optimize the dose distribution. The squared deviation function was used for dose optimization in the PTV area and the squared overdosing function was used for dose optimization in the OAR. The results of the CT images were selected as references, and the CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using the 2%/2mm γ-index criterion to compare the dose distribution. The absolute dose differences are shown in Figure 10.


The γ-index pass rate of CBAM-SCT was 88.97%, which is better than those of U-Net-SCT and CBCT (88.06% and 77.01%, respectively). Using the CBAM-U-Net network to eliminate CBCT image artifacts may provide a foundation for adaptive proton therapy.
Fig. 10. (a) Dose distribution in CT images of patient 1; (b) dose distribution in CBCT images of patient 1; (c) dose distribution in U-Net-SCT images of patient 1; (d) dose distribution in CBAM-SCT images of patient 1; (e) absolute dose difference between CBCT and CT images of patient 1; (f) absolute dose difference between U-Net-SCT and CT images of patient 1; and (g) absolute dose difference between CBAM-SCT and CT images of patient 1.


IV. DISCUSSION 


The proposed deep learning method synthesizes CT images from CBCT images; the synthesized images have the clarity of CT images while retaining the anatomical structure of the CBCT images. By adding CBAM blocks to sCTU-Net, CBAM-U-Net performed better than the original sCTU-Net in terms of noise reduction. CT, CBCT, and SCT images were compared using four image metrics. The image assessment parameters (Table 1) of CBAM-SCT were superior to those of U-Net-SCT and the original CBCT. The image difference error maps and Q-Q plots showed that CBAM-U-Net exhibited a better image noise reduction capacity and image correction performance in high-density tissues. High-density and soft tissue areas are commonly treated with radiotherapy. Particularly in high-density areas, noise is likely to cause serious dose-calculation errors. Therefore, CBAM-U-Net offers better accuracy for potential CBCT-based dose calculations.


The IDD curves of a single-energy proton beam were com- 
pared. Although the IDD curve of CBAM was closer to that 
of CT, errors were present in soft tissue and high-density tis- 
sue between the IDD curve peak of CBAM-SCT and CT. 
Therefore, further improvement of image correction will be 
implemented for accurate dose calculation. The results of the 
lateral dose profile analysis showed that CBAM-SCT images 
had a higher γ-index pass rate than the U-Net-SCT images
under the 1%/1mm and 3%/3mm calculation criteria. Ow- 
ing to the selected scanned field at the junction of the soft 
tissue and bone regions, Figure 9 shows that the absolute 